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Abstract 



We generalize the popular ensemble Kalman filter to an ensemble transform filter 
^ where the prior distribution can take the form of a Gaussian mixture or a Gaussian kernel 

density estimator. The design of the filter is based on a continuous formulation of the 
Bayesian filter analysis step. We call the new filter algorithm the ensemble Gaussian mix- 
I— — I ture filter (EGMF). The EGMF is implemented for three simple test problems (Brownian 

dynamics in one dimension, Langevin dynamics in two dimensions, and the three dimen- 
sional Lorenz-63 model). It is demonstrated that the EGMF is capable to track systems 
with non-Gaussian uni- and multimodal ensemble distributions. 

s 

' — ' 1 Introduction 
cn 

J> We consider dynamical models given in the form of ordinary differential equations (ODEs) 
O i = /(x,t) (1) 

rn 

^ with state variable x G M^. Initial conditions at time to are not precisely known and are treated 
O as a random variable instead, i.e., we assume that 

^ where 7ro(x) denotes a given probability density function (PDF). The solution of ^ at time t 
with initial condition xq at to is denoted by x(t; to, xq). 

The evolution of the initial PDF ttq under the ODE ([T| up to a time t > to is provided by 
the continuity equation 

^ = -Vx-(7r/), (2) 



which is also called Liouville's equation in the statistical mechanics literature (Gardiner, 2004). 
Let us denote the solution of Liouville's equation at observation time t by 7r(x, t). In other 
words, solutions x(t;to,Xo) with Xo ~ tto constitute a random variable with PDF vr(-,t). 

For a chaotic ODE ([T|, i.e. for an ODE with positive Lyapunov exponents, the PDF 7r(-,t) 
will be spread out over the whole chaotic attractor for t — )■ oo. This in turn implies a lim- 
ited solution predictability in the sense that the time-evolved PDF will become increasingly 
independent of the initial PDF ttq. Furthermore, even if the initial PDF is nearly Gaussian 
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with mean xq and small covariance matrix P, the solution x(t;to,xo) will become increasingly 
unrepresentative of the expectation value of the underlying random variable it is supposed to 
represent. 

To counteract the divergence of nearby trajectories under chaotic dynamics, we assume that 
we have uncorrelated measurements yobs(^j) ^ at times tj, j > 1 with measurement error 
covariance matrix R G M^^^, i.e. 



yobs(t,) -Hx(t,) ~N(0,R) 



(3) 



where the notation N(y, B) is used to denote a normal distribution in y G with mean y and 
covariance matrix B G M.^^^ . The matrix H G M^^^ is called the forward operator. The task 
of combining solutions to with in termittent measurements ([s]) is called data assimilation in 
the geophysical literature (Evensen, 2006) and filtering in the statistical literature (Bain and 



Crisan 2009). 



A first step to perform data assimilation for nonlinear ODEs ([T]) is to approximate solutions 
to the associated Liouville equation In this paper, we rely exclusively on particle methods 
(Bain and Crisan, 2009) for which Liouville's equation is naturally approximated by the evolv- 
ing empirical measure. More precisely, particle or ensemble filters rely on the simultaneous 
propagation of M independent solutions Xj(t) G M^, i = 1, . . . ,M, of ([T| (Evensen, 2006). We 
associate the empirical measure 



M 



7rem(x,t) = ^7i5(x - Xi(t)) 



(4) 



1=1 



with weights 7i > satisfying 



M 



4 = 1 



Here 6{-) denotes the Dirac delta function. Hence our statistical model is given by the empirical 
measure ^ and is parametrized by the particle weights {%} and the particle locations {xj}. In 
the absence of measurements, the empirical measure vTcm with constant weights 7^ is an exact 
(weak) solution to Liouville's equation ^ provided the Xj()f:)'s are solutions to the ODE ([T]). 
Optimal statistical efficiency is achieved with equal particle weights 7^ = 1/M. 

The assimilation of a measurement at tj leads via Bayes' theorem to a discontinuous change 
in the statistical model (|4]). Sequential Monte Carlo methods (Bain and Crisan, 2009) are 



primarily based on a discontinuous change in the weight factors 7,. To avoid a subsequent 
degeneracy in the particle weights one re-samples or uses other techniques which essentially 



lead to a redistribution of particle positions Xj. See, for example. Bain and Crisan (2009) for 



more details. The ensemble Kalman filter (EnKF) relies on the alternative idea to replace the 
empirical measure (|4]) by a Gaussian PDF prior to an assimilation step (Evensen, 2006). This 
approach allows for the application of the Kalman analysis formulas to the ensemble mean and 
covariance matrix. The final step of an EnKF is the re-interpretation of the Kalman analysis 
step in terms of modified particle positions while the weights are held constant at 7^ = 1/M. 
We call filter algorithms that rely on modified particle/ensemble positions and fixed particle 
weights ensemble transform filters. A new ensemble transform filter has recently been proposed 
by Anderson (2010). The filter is based on an appropriate transformation step in observation 
space and subsequent linear regression of the transformation onto the full state space. The 
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approach developed in this paper rehes instead on a general methodology for deriving ensemble 
transform filters as proposed by Reich (2011). See Section |2| below for a summary. The same 



methodology has been developed for continuous-in-time observations by Crisan and Xiong 



(2010). In this paper, we demonstrate how our ensemble transform filter framework can be 
used to generalize EnKFs to Gaussian mixture models and Gaussian kernel density estimators. 
The essential steps are summarized in Section [3] while an algorithmic summary of the proposed 
ensemble Gaussian mixture filter (EGMF) is provided in Section |4j The EGMF can also 
be viewed as a generalization of the continuous formulation of ensemble square root filters 
(Tippett et al. , 2003) as provided by Bergemann and Reich] ( |2010a||b| ) and the EnKF with 
perturbed observations, as demonstrated by Reich (2011). The paper concludes with three 
numerical examples in Section [5j We first demonstrate the properties of the newly proposed 
EGMF for one-dimensional Brownian dynamics under a double-well potential. This simulation 
is extended to the associated two-dimensional Langevin dynamics model with only velocities 
being observed. Finally we consider the three variable model of Lorenz (1963). 

We mention that alternative extensions of the EnKF to Gaussian mixtures have recently 
been proposed, for example, by Smith (2007), Stordal et al. (2011), and Frei and Kiinsch (2011). 
However, while the cluster EnKF of Smith (2007) is an example of an ensemble transform filter, 
it fits the posterior (analysed) ensemble distribution back to a single Gaussian PDF and, hence. 



only partially works with a Gaussian mixture. Both the mixture ensemble Kalman filter of Frei 



and Kiinsch (2011 ) and the adaptive Gaussian mixture filter of Stordal et al. (2011 ) approximate 



the model uncertainty by a sum of Gaussian kernels and utilize the ensemble Kalman filter as a 
particle update step under a single Gaussian kernel. Resampling or a re-weighting of particles 
is required to avoid a degeneracy of particle weights due to changing kernel weights. A related 
filter algorithm based on Gaussian kernel density estimators has previously been considered by 



Anderson and Anderson (1999). 



2 A general framework for ensemble transform filters 



Bayes' formula can be interpreted as a discontinuous change of a forecast PDF ttj into an 
analyzed PDF tTq at each observation time tj. On the other hand, one can find a continuous 
embedding 7r(x, s) with respect to a fictitious time s G [0,1] such that vr(-,0) = vr/ and vTa = 
7r(-,l). As proposed by Reich (2011), this embedding can be viewed as being induced by a 



continuity (Liouville) equation 



ds 



-Vx ■ (ng) 



(5) 



for an appropriate vector field (/(x, s) G M . The vector field g is not uniquely determined for a 
given continuous embedding vr(-, s) unless we also require that it is the minimizer of the kinetic 
energy 

T{v) = ^ j dnv^Mv 

over all admissible vector fields v G L^(d7r, M^), where M G M^^^ is a positive definite mass 
matrix (Villani, 2003). Admissibility means that g = v satisfies ^ for given vr and dn/ds. 



Under these assumptions, a constrained variational principle ( Villani[ 2003) implies that 
the desired vector field is given hj g = M~^Vx'V^, where the potential ip{'x,s) is the solution of 
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the elliptic partial differential equation (PDE) 

Vx • (ttM-Vx^) =7r{S- E^[S]) (6) 
for given PDF tt, mass matrix M, and negative log-likelihood function 

5(x; yobs(t,)) = ^(Hx - yobs(ti))'^R-'(Hx - yobs(t,)). (7) 

Here ^Ttlf] denotes the expectation value of a function /(x) with respect to a PDF vr(x). We 
finally replace ([s]) by 

— = -Vx ■ (ttM^Vx^) (8) 
with an underlying ODE formulation 

^ = M-VxV^(x,s) (9) 

in fictitious time s e [0, 1]. As for the ODE ([T]) and its associated Liouville equation ([2]), we 
may approximate ^ and its associated Liouville equation ^ by an empirical measure of type 
Q. Furthermore, one and the same empirical measure approximation can now be used for 
both the ensemble propagation step under the model dynamics ([T| and the data assimilation 
step ([s]) using constant and equal weights 7^ = 1/M. The particle filter approximation is closed 
by finding an appropriate numerical solution to the elliptic PDE (|6]). This is the crucial step 
which will lead to different ensemble transform filter algorithms. 

The basic numerical approach to the data assimilation step within an ensemble transform 
filter formulation consists then of the following sequence of steps, (i) Given a current ensemble 
of solutions Xj(s), i = 1, . . . , M, one fits a statistical model 7f(x, s). (ii) Solve the elliptic PDE 

V^-{7rg) = 7r{S-E^[S]) (10) 

for a vector field ^(x, s). The solution is not uniquely determined and an appropriate choice 
needs to be made. See the discussion above, (iii) Propagate the ensemble members under the 
ODE ^ 

^=?(xi,s). (11) 
We assume that a forecast ensemble of M members Xj e M^, i = 1, . . . , M, is available at an 



observation time tj which provides the initial conditions for the ODE (11). Solutions at s = 1 
yield the analyzed ensemble members, which are then used as the new initial conditions for ([T]) 
at time t = tj and ([T]) is solved over [tj,tj+i] up to the next observation point. 

If the statistical model is a Gaussian with mean x G and covariance matrix P G M^^^, 
then the outlined approach leads to a continuous formulation of the ensemble square-root 
ensemble filter analysis step at time tj (Bergemann and Reich, 2010a|[b ), i.e. 



dx- 1 

= --PH^R-' (Hx, + Hx - 2yobs(t,)) (12) 



for s G [0, 1]. It follows that M = P"^ and 



V;(x) = (Hx + Hx - 2yobs(t,))^ R"' (Hx + Hx - 2yobs(ti)) • (13) 
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3 An ensemble transform filter based on Gaussian mix- 
ture statistical models 

We now develop an ensemble transform filter algorithm based on a L > 1 component Gaussian 
mixture model, i.e. 



vr X 



E 



Oil 



(2vr)A'/2detP;/^ 



exp 



(-^(X-XO^P; ^(X-X^j = "/^Gauss/x) 
^ ^ 1=1 



(14) 



where vrGauss,/(x) denotes the normal distribution in x G with mean x^ and covariance matrix 
P;. The Gaussian mixture parameters, i.e. ai,xz, P/, / = 1, . . . , L, need to be determined from 
the ensemble Xj, i = 1, . . . , M, in an appropriate manner using, for example, the expectation- 



maximization (EM) algorithm (Dempster et al. , 1977 Smith, 2007). See Section 3.3 for more 
details. Note that Yld=i (^i = ^ and ai > 0. To simplify notation, we write tt/ instead of vrcauss,/ 
from now on. 

An implementation of the associated continuous formulation of the Bayesian analysis step 
proceeds as follows. To simplify the discussion we derive our filter formulation for a scalar 
observation variable, i.e. K = 1, Uohsitj) — Hx(tj) ~ N(0,i?), and 



1 2 
5(x; yohsitj)) = — (2/obs(ij) - Hx) 



(15) 



The vector-valued case can be treated accordingly provided the error covariance matrix R is 
diagonal. We first decompose the vector field ^(x, s) G in (11) into two contributions, i.e. 



dx 

d^ 



^(x, s) = ma(x, s) + mb(x, s). 



(16) 



To simplify notation we drop the explicit s dependence in the following calculations. We next 



decompose the right hand side of the elliptic PDE (10) also into two contributions 

7f (^(x) -E^[^]) = |5^«,7r,(5(x) -E^J^]) 1 + (E., -E^[5]) |> . (17) 



1=1 



1=1 



We now derive explicit expressions for ua(x) and mb(u). 

3.1 Gaussian mixture Kalman filter contributions 

We define the vector field ma(x) through the equation 

L 



tiA(Xj 



1=1 



TT X 



PzVx^A,z(x), 



(18) 



together with 

Vx • {^r(x)^iA(x)} = Vx ■ <j 5^ai7rKx)PiVxV'A,Kx) |> = ^ az7rKx)(5(x) - E^JS]) (19) 



1=1 



1=1 
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which imphes that the potentials ipA,i{'^), I = I, ■ ■ ■ , L, are uniquely determined by 

Vx ■ {7rKx)P,VxV'A,Kx)} = vrKx)(5(x) - E^J5]) (20) 
for all / = 1, . . . , L. It follows that the potentials iIja,i{'^) are equivalent to the ensemble Kalman 



filter potentials for the l-th Gaussian component. Hence, using (12) and (18), we obtain 

L 

i , 

ua(x, s) 



1 ^ 



Piis)U^R-' [Hx(s) + Hx^s) - 2yobs(t,)] • (21) 



2^ 7r[^,s) 



3.2 Gaussian mixture exchange contributions 

The remaining contributions for solving ^ are collected in the vector field 

^ ai7ii{x.) 

Mb(x) = } ^ P;Vx^bAx), 



which therefore needs to satisfy 



1=1 



and, hence, we may set 



Vx • {7rKx)PzVxV'B,Kx)} = vrKx)(E.JS] - E^[^]) 



for all / = 1, . . . , L. To find a solution of (24) we introduce functions -^b,; such that 



iPbA^) = ^B,/(Hx - HXi) = ^B,Kz/ - Vl) 
with ?/ := Hx and yi = Hx/. Now the right hand side of (p4j) gives rise to 

Ai^W/ - N , TTT1 TTrd^V': 



Vx ■ {7rKx)PzVx^B,Kx)} = TT^x) ( -{y - vd^iv - Vi) + HP^H 



and (24) simplifies further to the scalar PDE 



-(y-yJ^(2/-y/) + HP,H 
dy 



dy2 



(2/-y,)=E.J5]-E^[5]. 



The PDE (27) can be solved for 



/(^) = ^(2/-y/) 



z = y-yi, 



under the condition /(O) = by explicit quadrature and one obtains 

1 E.. [s] - E^[s] erf {(y - Vi)l^\ 



fiy yi)-l ^'hp,h 



(22) 



Vx ■ {7r(x)nB(x)} = Vx ■ < az7rKx)P,Vx^B,/(x) \ = a,7rKx)(E.JS] - E^[5]) (23) 



(24) 



(25) 



{y-yi)\ (26) 



(27) 



(28) 



(29) 
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with marginalized PDF 



(7/ = y/HPiH^, and the standard error function 



o2 



Note that 



eTi{y) = ^ / e-^'ds. (31) 



1 d 

2dy 

We finally obtain the expression 



eTi((,y-yi)/^^=My). (32) 



1 ' oK^Mx^p ^^ E^, [s]{s) - E,[s]{s) {(y - yi)/V^i 



for the vector field mb(x, s). 

The expectation values E^JS], I = 1, . . . , L, can be computed analytically, i.e. 

K,[S] = ^{iyoUtj)-yir + crf) (34) 

or estimated numerically. Recall that ^ = 1 and, therefore, 

L 

E^[S] = Y,(^i^n,[S]. (35) 

/=i 

It should be kept in mind that the Gaussian mixture parameters ai,5li,Pi can be updated 
directly using the differential equations 

^ = -PiH^R-\H^i-yoUtj)), (36) 

^ = -P,H^i2-iHP,, (37) 

^ = -^ai{{Il^i-yoUtj)fR'\ii^i-yohs{t,)) + X}, (38) 
for I = 1, . . . , L. Here A G M is chosen such that 

da; , , 

Furthermore, ua(x, s) exactly mirrors the update of the Gaussian mixture means x; and co- 
variance matrices P^, while mb(x, s) mimics the update of the weight factors ai by rearranging 
the particle positions accordingly. 

As already eluded to, we can treat each uncorrelated observation separately and sum the 
individual contributions in ma(x, s) and mb(x, s), respectively, to obtain the desired total vector 
field (piel). 
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3.3 Implementation aspects 

Given a set of ensemble members Xj, z = 1, . . . , M, there are several options for implementing 
a Gaussian mixture filter. The trivial case L = 1 leads back to the continuous formulations 
of Bergemann and Reich ( 2010a||b ). More interestingly, one can chose L <^ M and estimate 



the mean and the covariance matrices for the Gaussian mixture model using the EM algorithm 



(Dempster et al. , 1977; Smith, 2007). The EM algorithm self-consistently computes the mixture 



mean x; and covariance matrix P; via 



M 



M 



Eft 



X/l (Xi - X; 



(40) 



for I 



L using weights f3i^i defined by 



azTTzlXj 



ai 



1 ^ 

i=l 



(41) 



The EM algorithm can fail to converge and a possible remedy is to add a constant contribution 



51 to the empirical covariance matrix in (40) with the parameter S > appropriately chosen. 



We mention that more refined implementations of the EM algorithm, such as those discussed 



by Fraley and Raftery (2007), could also be considered. It is also possible to select the number 



of mixture components adaptively. See, for example. Smith (2007). The resulting vector fields 
for the ith ensemble member are given by 



1 ^ 

- E A/s)PKs)H^/?-i [Hxi(s) + Hx^s) - 2|/obs(t,)] 



(42) 



1=1 



and, using (33) 



(43) 



with weights (3i^i given by (41). 



Another option to implement an EGMF is to set the number of mixture components equal 
to the number of ensemble members, i.e. L = M, and to use a prescribed covariance matrix B 
for all mixture components, i.e. = B and x; = x;, / = 1, . . . , L. We also give all mixture 



components equal weights ai = 1/M. In this setting, it is more appropriate to call (14) a kernel 



density estimator (Wand and Jones, 1995). Then 



1 ^ 



(44) 



1=1 



and 



MBlXi, Si 



1 ^ 



s)BH 



^E^J5](s)-E^[5](s: 



err 



1=1 



HBH 



T 



{y-yi)/^f 



(45) 
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with weights Pi^i given by (41), o"/ = VHBH^, and ai = 1/M. The Kalman filter hke con- 



tributions (44) can be replaced by a formulation with perturbed observations (Evensen, 2006 



Reich 2011) which yields 



MA(xj, s) = -BH R [Hxi(s) - ?/obs(ij) + di 



(46) 



where G M, z = 1, . . . , m, are independent, identically distributed Gaussian random numbers 
with mean zero and variance R. A particular choice is B = cP, where P is the empirical 
covariance matrix of the ensemble and c > is an appropriate constant. Assuming that the 
underlying probability density is Gaussian with covariance P, the choice 



(2/(Ar + 2))4/(^+4)M-2/(^ 



+4) 



(47) 



is optimal for large ensemble sizes M in the sense of kernel density estimation (see, e.g. Wand 



and Jones (1995)). Recall that N denotes the dimension of phase space. The resulting filter is 



then similar in spirit to the rank histogram filter (RHF) suggested by Anderson (2010) with the 
RHF increments in observation space being replaced by those generated from a Gaussian kernel 
density estimator. Another choice is B P in which case (45) can be viewed as a correction 



term to the standard ensemble Kalman filter (46). We will explore the kernel estimator in the 



numerical experiment of Section 5.4 



Note that localization, as introduced by Houtekamer and Mitchell (2001) and Hamill et al. 



(2001), can be combined with (42)-(43) and (44)-(45), respectively, as outlined in Bergemann 



and Reich (2010a). For example, one could set the covariance matrix B in (|44])-(45) equal 
to the localized ensemble covariance matrix. Localization will be essential for a successful 
application of the proposed filter formulations to high-dimensional systems ([T|. The same 
applies to ensemble infiation (Anderson and Anderson, 1999). 



We also note that the computation of the particle-mixture weight factors ( |41[ ) can be become 
expensive since it requires the computation of P^"^. This can be avoided by either using only 



the diagonal part of P; in 7ri(xj) (Smith, 2007) or by using a marginalized density such as (30), 
i.e. 7ii{yi), Hi := Hxj, instead of the full Gaussian PDF values TTi{-Xi). Some other suitable 
marginalization could also be performed. 

The vector field ub(x, s) is responsible for a transfer of particles between different mixture 
components according to the observation adjusted relative weight ai of each mixture compo- 
nent. These transitions can be rather rapid implying that ||mb(x, s)||oo can become large in 
magnitude. This can pose numerical difficulties and we eliminated those by limiting the /qo- 



norm of Mb(x, s) through a cut-off value Mcut- Alternatively, we might want to replace (30) by 
a PDF which leads to less stiff contributions to the vector field mb(x, s) such as the Student's 



t-distributions (Schaefer, 1997). Hence a natural approximative PDF is provided by the the 
scaled t-distribution with three degrees of freedom, i.e. 



2a' 



{y-yf 



(48) 



We also introduce the shorthand 4>i{y) = (p{y; yi, <Ji) with yi = Hx; and ai = a/HP/H^. A first 
observation is that 



E, 



<f>i\ 



yu 



^<p\{y-yif] = cj^ 



(49) 
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i.e., the first two moments of (pi matcli tliose of (30). Tlie second observation is tliat 0/ can be 
integrated explicitly, i.e. 



Vl 



(f)i{u)du = — arctan ( — — — 

71 V ^« 



(^i jy - yi) 
(^1 + {y- my 



Hence the relation (32) suggests the alternative formulation 



(50) 



mb(x, s 



1=1 



vr X, s 



E^jg](g)-E^[g](s) Hy,s)) 



(51) 



where 0/(?/, s) = (f){y] yi{s), cri{s)). 

The differential equation ( [l6| ) needs to be approximated by a numerical time-stepping pro- 
cedure. In this paper, we use the forward Euler method for simplicity. However, the limited 
region of stability of an explicit method such as forward Euler implies that the step-size As 
needs to be made sufficiently small. This issue has been investigated by Amezcua et al. ( |2011 ) 
for the formulation with L = 1 (standard continuous ensemble Kalman filter formulation) and 
a diagonally implicit scheme has been proposed which overcomes the stability restrictions of 
the forward Euler method while introducing negligible computational overhead. The computa- 



tional cost of a single evaluation of ( 16 ) for given mixture components should be slightly lower 
than for a single EnKF step since no matrix inversion is required. Additional expenses arise 
from fitting the mixture components (e.g. using the EM algorithm) and from having to use a 
number of time-steps 1/As > 1. 



4 Algorithmic summary of the ensemble Gaussian mix- 
ture filter (EGMF) 

Since the proposed methodology for treating nonlinear filter problems is based on an extension 
of the EnKF approach, we call the new filter the ensemble Gaussian mixture filter (EGMF). 
We now provide an algorithmic summary. 

First a set of M ensemble members Xj(0) is generated at time to according to the initial 
PDF ttq. 

In between observations, the ensemble members are propagated under the ODE model ([TJ, 

i.e. 

ii = /(xi,t), (52) 

for i = 1, . . . , M and t G t-,]. 



At an observation time tj, a Gaussian mixture model (14) is fitted to the ensemble members 



1, . . . , M. One can, for example, use the classic EM algorithm (Dempster et al. , 1977 



Smith 2007) for this purpose. In this paper we use a simple heuristic to determine the number 



of components L G {1,2}. An adaptive selection of L is, however, feasible (see, e.g.. Smith 



(2007)). Alternatively, one can set L = M and implement the EGMF with a Gaussian kernel 
density estimator with = x^, = 1/M. The covariance matrix B can be based on the 
empirical covariance matrix P of the whole ensemble via B = cP with the constant c > 
appropriately chosen. At this stage covariance localization can also be applied. 
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Figure 1: Displayed are the prior distribution, the hkehhood from a measurement and the 
resulting posterior distribution. The prior as well as the posterior are bimodal Gaussian. 



The vector fields ma(x, s) and mb(x, s) are computed according to (42 ) and (43 ), respectively, 
(or, alternatively, use ([45])- ([46])) for each independent observation and the resulting contribu- 
tions are summed up to a total vector field ^(x, s). Next the ensemble members are updated 

= 1, 



according to ( 16 ) for x = Xj 



M. Here we use a simple forward Euler discretization 



with step-size As chosen sufficiently small. After each time-step the Gaussian mixture com- 
ponents are updated, if necessary, using the EM algorithm. The analyzed ensemble members 



Xj(tj) are obtained after 1/As time-steps as the numerical solutions at s = 
new initial conditions for (52) with time t now in the interval 

Ensemble induced estimates for the expectation value of a function /(x 

via 

1 ^ 



1 and provide the 
can be computed 

(53) 



without reference to a Gaussian mixture model. 



5 Numerical experiments 

In this section we provide results from several numerical simulations and demonstrate the per- 
formance of the proposed EGMF in comparison with standard implementations of the EnKF 



and an implementation of the RHF (Anderson, 2010). We first investigate the Bayesian assim- 
ilation step without model equations. 

5.1 Single Bayesian assimilation step 

We test our formulation first for a single assimilation step where the prior is a bimodal Gaussian 
and the likelihood is 

7r(yobs|x) = -L^e-(---)'/^l (55) 
V 27r4 
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Figure 2: Numerically obtained posterior for ensemble sizes M = 50 (left panel) and M = 2000 
(right panel). Shown are results from the EGMF, the RHF, and an EnKF analysis step. While 
the EGMF and the RHF converge to the correct posterior distribution, the EnKF leads to 
qualitatively incorrect results for both ensemble sizes. 



particle dynamics under EGMF assimilation step 




-6 



0.2 0.4 0.6 0.8 1 

ficticious time s 

Figure 3: Displayed is the rearrangement of the particles under the dynamics of the EGMF 
analysis step. Rapid transitions between the Gaussian mixture components are induced by the 
vector field ub- 
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Figure 4: Shown is the potential energy V used in both the Brownian and Langevin dynamics 
model. 



Reference trajectory 
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Figure 5: Shown is the reference solution from which observations are generated by adding 
Gaussian noise with mean zero and variance R. 



The posterior distribution is again bimodal Gaussian and can be computed analytically. See 
Fig. [T] Here we demonstrate how an EnKF, the RHF, and the proposed EGMF approximate 
the posterior for ensemble sizes M = 50, 2000 and for Xj(0) ~ TTpnor, « = !,..., M. See Fig. [2] 
Both the RHF and the EGMF are capable of reproducing the Bayesian assimilation step 
correctly for M sufficiently large while the EnKF leads to a qualitatively incorrect result. The 



transformation of the ensemble members (particles) under the dynamics (16) is displayed in 
Fig-i 

We now present results from three increasingly complex filtering problems. 
5.2 A one-dimensional model problem 

As a first numerical example we consider Brownian dynamics under a one-dimensional potential 
V{x), i.e. 

dx = -V'{x)dt + dw{t), (56) 
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Figure 6: Displayed are two posterior PDFs for measurement error variance i? = 36 as obtained 
from the Fokker-Planck approach. A distinct bimodal behavior can be observed which motivates 
the use of a binary Gaussian mixture model for the EGMF. 



Fokker-Planck filter estimate, R = 36 Fokker-Planck filter estimate, R = 4 




2000 4000 6000 8000 10000 2000 4000 6000 8000 10000 



Figure 7: Estimated ensemble mean computed from a direct simulation of the assimilation 
problem using a discretized Fokker-Planck equation for measurement error variance R = 36 
(left panel) and R = 4 (right panel). 




Figure 8: Ensemble mean from the EGMF for measurement error variance R = 36 (left panel) 
and R = 4 (right panel). It can be observed that the EGMF leads to results similar to those 
from the discrete Fokker-Planck approach (Fig. ItI). 
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Figure 9: Ensemble mean from a RHF for measurement error variance R = 3Q (left panel) and 
R = 4 (right panel). The results are for M = 50 particles. It can be observed that the RHF 
leads to results similar to those from the discrete Fokker-Planck approach (Fig. ItI). 




Figure 10: Ensemble mean from an EnKF with perturbed observations for measurement error 
variance R = 36 (left panel) and R = 4 (right panel). The results are for M = 50 particles. 
The EnKF behaves not as well as the RHF, the EGMF, and the discretized Fokker-Planck 
approach. Similar results are obtained for an ensemble square root filter. 
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where w{t) denotes standard Brownian motion and the potential is given by 



V{x) =cos{x) + ^{x/6Y. (57) 



See Fig. |4j The true reference trajectory is started at x(0) = —3.14. See Fig. |5} Measurements 
of x{t) are collected every 10 time units with two different values of the measurement error 
variance R {R = 4, 36). 

The initial PDF is given by the bimodal distribution 

Depending on the distribution of ensemble members we either use a single Gaussian [L = 1) 
or a bi-Gaussian mixture (L = 2) in the EGMF assimilation step. A single Gaussian is used 
whenever more than 90% of the particles are located near either the right (i.e. Xi > 0) or 
left potential well (i.e. Xi < 0). The computed variances are modified such that af > 0.0005 



to avoid singularities in the EM algorithm. The model equation (56) is discretized with the 



forward Euler method and time-step At = 0.1. The total simulation interval is t G [0, 100000]. 



The assimilation equations with (42) and (43) are discretized with the forward Euler method 
and step-size As = 0.05. The /oo-norm of mb(x, s) is limited to a value of Ucut = 5/As. 

The performance of the EGMF is compared to an EnKF with perturbed observations, an 
ensemble square root filter, and a RHF. The particle positions are adjusted during each data 
assimilation step of a RHF such that the particles maintain equal weights 7^ = 1/M. The 



adjustment is done similar to what has been proposed by Anderson (2010) except that the 
posterior is approximated by piecewise constant functions. 

For this simple, one-dimensional problem the densities can be directly propagated through a 
discretization of the associated Fokker-Planck equation. Bayes theorem reduces to a multiplica- 
tion of the prior PDF approximation from the Fokker-Planck approximation with the likelihood 
at each grid point. We have used a grid with mesh-size Ax = 0.125 over x G [—10, 10] to pro- 
vide an independent and accurate filtering result. Periodic boundary conditions are used for the 
diffusion operator such that the spatial discretization leads to a stochastic matrix. It is found 
from the numerical simulations that R = 36 leads to a pronounced bimodal behavior of the 
solution PDF tt. See Figure |6] for two posterior PDF approximations from the Fokker-Planck 
approach. 

Typical filter behaviors over the time interval t G [0, 10000] with regard to the reference 



trajectory can be found in Figures ^ pi and 10, respectively, for M = 50 ensemble members. 



The EGMF and the RHF display a behavior similar to that from the discretized Fokker-Planck 
approach while significantly different results are obtained from the EnKF implementation with 
perturbed observations. Similar results are obtained for an ensemble square root filter (not 
displayed). The EGMF uses a bi-Gaussian approximation in 97% of the assimilation steps for 
i? = 36 and in 47% of the assimilation steps for R = A. 

We also provide the root mean square (RMS) error between the computed mean from the 
three different filters and the mean computed from the Fokker-Planck approach in Table [l] for 
R = 36 and different ensemble sizes M. The RHF converges as M — )■ cxd to the solution of the 
Fokker-Planck approach for this one-dimensional model problem. The EGMF provides better 
results than the EnKF but does not converge since the limiting distributions are not exactly 
bi-Gaussian. Note that the EGMF should converge for M — )■ 00 and the number of mixture 
components sufficiently large. Note also that the RHF does not converge to the analytic filter 
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Table 1: RMS errors for ensemble means obtained from EnKF, RHF, and EGMF compared 
to the expected value computed by a Fokker-Planck discretization with error variance R = 36 
and M = 20, 50, 100 particles/ensemble members. 







RHF 


EGMF 


EnKF 


M 


= 20 


0.6551 


0.7683 


1.0283 


M 


= 50 


0.3717 


0.5127 


0.8798 


M 


= 100 


0.2691 


0.4033 


0.8412 



solution as M — )■ oo in case of more than one dynamic variable (i.e. > 1). See also the 
following example. 

5.3 A two-dimensional model problem 

We discuss another example from classical mechanics. The evolution of a particle with position 



g G M and velocity f G M is described by Langevin dynamics (Gardiner, 2004) with equations 
of motion 

dq = V dt, (59) 
dv = -V'{q)dt--fvdt + adw{t), (60) 

where the potential V{q) is given by 

V{q) = cos{q) + ^{q/6)\ (61) 

the friction coefficient is 7 = 0.25, w{t) denotes standard Brownian motion, and cr^ = 0.35. A 
reference solution, denoted by {qr(t),Vr(t)), is obtained for initial condition {qo,vo) = (1, 1) and 
a particular realization of w{t). 

Let us address the situation that the reference solution is not directly accessible to us and 
that instead we are only able to observe Q{t) subject to 

dQ{t) = Vr{t) dt + c^''^di{t), (62) 

where ^(t) denotes again standard Brownian motion and c = 0.2. In other words, we are 
effectively only able to observe particle velocities. 

We now combine the model equations and the observations within the ensemble Kalman- 
Bucy framework. The ensemble filter relies on the simultaneous propagation of an ensemble of 
solutions Xiif) = {qi{t) , Vi(t)) G M^, i = 1, . . . , M. In our experiment we set M = 50. The filter 



equations for an EGMF with a single Gaussian, ensemble Kalman-Bucy filter (Bergemann and 



Reich, 2011), respectively, become 



P V 

dqi = Vidt —{vidt + vdt — 2dQ{t)), (63) 

p 

dvi = -V'{q^)dt--fVidt + adwi{t) - ^{vidt + vdt-2dQ{t)) (64) 
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with mean 

M , M 



E^- ^"=XfE^' (65) 

and variance/covariance 

Pvv = ^{Vi-vf, Pg^ = ^_ ^ ^{qi - q)[vi - V). (66) 

j=l i= 

The equations are solved for each ensemble member with different realizations Wiit) of standard 



i=\ 1=1 



M , M 



Brownian motion and step-size At = 0.01. The observation interval in (62) is r = At. The 
extension of the EGMF to a Gaussian mixture with L = 2 is straightforward. One substitutes 

y = V, R = cj 1st, and ?/obs = ^Qitn) / with 

AQ(t„) = Vr{tn)^t + y^^n, ~ N{{}, 1), (67) 



into (42) and (43) and sets As = 1 in the numerical time-stepping procedure for the assimilation 
step. The Zoo-norm of mb(x, s) is limited to a value of Uc^t = 0.25/As. Assimilation is performed 
at every model time-step. We perform a total of two million time-steps/data assimilation cycles. 
In the same manner one can implement a RHF for this problem. 

The computed ensemble means q{t) and v{t) in comparison to the reference solution can be 



found in Fig. 11 for the continuous EGMF (using L = 1 and L = 2 mixture components as 
appropriate) and the ensemble Kalman-Bucy filter (continuous EGMF with L = 1). The root 
mean square (RMS) error with respect to the true reference solution is 2.3331 for the ensemble 
Kalman-Bucy filter and 1.9148 for the EGMF, which amounts to a relative improvement of 
about 20%. The EGMF uses a bi-Gaussian distribution in about 25% of the assimilation 



steps. For comparison we show the results from the RHF in Fig. 12 for M = 50 particles. To 
interpret the behavior of the RHF one needs to look at the potential energy function V{q) (see 
Fig. |4]). The RHF assimilation scheme apparently pushes the solutions occasionally into the 
flat side regions of the potential energy curve resulting in a relatively large RMS error of 3.9375. 
Qualitatively similar results are obtained for the RHF with M = 800 particles. Recall that we 
do observe velocities and not positions in this example and that the RHF uses the ensemble 
generated covariance matrix P to linearly regress filter increments onto state space. 



5.4 Lorenz-63 model 

The three variable model 



x = 10{y~x), y = x{28 — z) — y, z 



xy z 

^ 3 



(68) 



of Lorenz ( 1963 ) is used as a final test for the EGMF method. Only the x variable is observed 



every 0.20 time units with an observational error drawn from a normal distribution with mean 
zero and variance eight. The model time-step is At = 0.01. A total of 101000 assimilation steps 
is performed for each experiment with only the last 100000 steps being used for the computation 
of RMS errors. We have implemented an ensemble square root filter, a RHF, and an EGMF 
using formulation (45)-(46) with B = cP, P the empirical covariance matrix of the ensemble. 



The parameter c is chosen from the interval c G [0.4, 1.0]. The number of ensemble members 
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Figure 11: The reference solution qr{t) (left panel) and the estimated (ensemble mean) solution 
from the continuous EGMF (right panel) over the first quarter of the simulation interval t G 
[0, 20000]. The estimated solution mostly follows the reference solution with the exception of a 
number of missed transitions. 



is set to M = 25 and no covariance localization is applied. The internal assimilation step-size 
is As = 1/4 and the Zoo-norm of mb(x, s) is limited to a value of -Ucut = 0.125/As. We have 
computed the RMS errors for ensemble inflation factors between 1.0 and 1.3 and only report 
the optimal results in Fig. 13 as a function of c for the EGMF. The overall smallest RMS error 
is achieved for c = 0.6 with a value of 4.1114. The associated RMS errors for the ensemble 
square root filter are 4.4813 and 4.6596 for the RHF, respectively. An increase in the number 
of ensemble members to M = 100 leads to a reduction in the RMS error for the RHF to 4.3276 
while the ensemble square root filter yields its optimal performance for M = 50 with a RMS 
error of 4.3785. Both values are significantly larger than the optimal RMS error for the EGMF 
with M = 25. A better performance is observed for the EnKF with perturbed observations 
and M = 25 for which we obtain 4.1775 as the smallest RMS error which, in fact, is close to 
the performance of the EGMF with c = 1. 



6 Summary 



We have extended the popular EnKF to statistical models provided by Gaussian mixtures. 
The EGMF is derived using a continuous reformulation of the Bayesian analysis step and 
consists of a combination of EnKF steps for each mixture component and an exchange term. 
The exchange term is determined for each measurement by a scalar elliptic PDE, which can 
be solved analytically. We have demonstrated by means of two numerical examples that the 
EGMF performs well when bimodal PDFs arise naturally from the model dynamics. The EGMF 
provides a valuable and easy to implement alternative to sequential Monte Carlo methods and 
other nonlinear filter algorithms. In this paper, we have used the standard EM algorithm to 
assign Gaussian mixture models to ensemble predictions. More refined methods such as those 



discussed by Fraley and Raftery (2007) will be considered in future work in order to provide 
a robust and accurate clustering of ensemble predictions. Alternatively, one can implement 
the EGMF with a Gaussian kernel density estimator. In this case, the empirical covariance 



matrix of the ensemble can be used as a base for kernel bandwidth selection (Wand and Jones 



1995). With this choice, the EGMF becomes closely related to the RHF of Anderson (2010). 



The feasibility of our approach has been demonstrated for the Lorenz-63 model. Further work 
is required to assess the merits of Gaussian kernel density estimators in comparison to EnKF 
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Figure 12: The estimated (ensemble mean) solution from the ensemble Kalman-Bucy filter over 
a quarter of the simulation interval is displayed in the left panel. The results look qualitatively 
similar to the results from the EGMF filter. However, in terms of RMS errors, the EGMF 
outperforms the ensemble Kalman-Bucy filter by about 20% over the complete simulation 
interval. The right panel displays the estimated (ensemble mean) solution from the RHF (left 
panel). The reader should note the enlarged range of the vertical axis. At several instances 
the filtered solution strongly deviates from the reference solution. To interpret this behavior 
we need to have a closer look at the potential energy V{q) (compare Fig. |4]). Apparently the 
RHF interprets the data as corresponding to solutions with positions in the flat side regions of 
the potential energy function. 



and RHF implementations for high dimensional systems. Encouraging results have also been 
reported by Stordal et al. (2011) for their related adaptive Gaussian mixture filter applied to 



the Lorenz-96 model (Lorenz, 1996 Lorenz and Emanuel 1998) 
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